An Empirical Study of Memory Hardware Errors in A Server Farm

نویسندگان

  • Xin Li
  • Michael C. Huang
  • Kai Shen
  • Lingkun Chu
چکیده

The integrity of system hardware is an important requirement for providing dependable services. Understanding the hardware’s failure mechanisms and the error rate is therefore an important step towards devising an effective overall protection mechanism to prevent service failure. In this paper we discuss an on-going case study of memory hardware failures of production systems in a server-farm environment. We present some preliminary results collected from 212 machines. Our observations under a normal, non-accelerated condition validate the existence of all failure modes modeled in the previous literature: single-cell, row, column, and whole-chip failures. We also provide a quantitative analysis of the error rates.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

mcelog: memory error handling in user space

Servers and high-performance computing systems contain more and more memory to handle bigger data sets. But with more and larger memory modules, and more transistors in them, combined with larger clusters of systems, the rate of memory errors in operation is also increasing. Modern server systems generally use ECC memory and other ways to detect and correct many memory errors in the hardware. W...

متن کامل

Measuring the Impact of Memory Errors on Application Performance

Memory reliability is a key factor in the design of warehousescale computers. Prior work has focused on the performance overheads of memory fault-tolerance schemes when errors do not occur at all, and when detected but uncorrectable errors occur, which result in machine downtime and loss of availability. We focus on a common third scenario, namely, situations when hard but correctable faults ex...

متن کامل

Heterogeneous-Reliability Memory: Exploiting Application-Level Memory Error Tolerance

Recent studies estimate that server cost contributes to as much as 57% of the total cost of ownership (TCO) of a datacenter [1]. One key contributor to this high server cost is the procurement of memory devices such as DRAMs, especially for data-intensive datacenter cloud applications that need low latency (such as web search, in-memory caching, and graph traversal). Such memory devices, howeve...

متن کامل

An Empirical Analysis of Vertical Integration Determinants among Peasant Farmers in Northern Algeria

This study aims to analyze the determinants of vertical integration (ownership and contract-ing) among peasant farmers in Northern Algeria. The choice of asset control is between ownership and a simple contracting. Thus, the integration of vertical stages of agricultural produc-tion leads to higher gross margins, influences the choice of marketing and supply channels, and improves market partic...

متن کامل

Persistent Residual Increase in Server Processing Time

In this case study we present our observations of a query processing engine running at a server farm operated by one of our industrial partners. We examine the query engine response time (termed MSBFS) under a variety of conditions. Observations show that there is a persistent residual increase in the server processing time that is only reset with rebooting the hardware.

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007